The frequency of defective genomes in Omicron differs from that of the Alpha, Beta and Delta variants

The SARS-CoV-2 Omicron variant emerged showing higher transmissibility and possibly higher resistance to current COVID-19 vaccines than other variants dominating the global pandemic. In March 2020 we performed a study in clinical samples, where we found that a portion of genomes in the SARS-CoV-2 viral population accumulated deletions immediately before the S1/S2 cleavage site (furin-like cleavage site, PRRAR/S) of the spike gene, generating a frameshift and appearance of a premature stop codon. The main aim of this study was to determine the frequency of defective deletions in prevalent variants from the first to sixth pandemic waves in our setting and discuss whether the differences observed might support epidemiological proposals. The complete SARS-CoV-2 spike gene was deeply studied by next-generation sequencing using the MiSeq platform. More than 90 million reads were obtained from respiratory swab specimens of 78 COVID-19 patients with mild infection caused by the predominant variants circulating in the Barcelona city area during the six pandemic waves: B.1.5, B.1.1, B.1.177, Alpha, Beta, Delta, and Omicron. The frequency of defective genomes found in variants dominating the first and second waves was similar to that seen in Omicron, but differed from the frequencies seen in the Alpha, Beta and Delta variants. The changing pattern of mutations seen in the various SARS-CoV-2 variants driving the pandemic waves over time can affect viral transmission and immune escape. Here we discuss the putative biological effects of defective deletions naturally occurring before the S1/S2 cleavage site during adaption of the virus to human infection.

Amplicon A78, running from nt 1905 to 2260 in the spike protein (aa636-aa753), is of particular interest because it includes the S1/S2 O-linked glycan residue polybasic TMPRSS2 cleavage site (aa685-aa686). We obtained coverages of 10,346 to 215,520 reads per amplicon (Supplementary Table S2) for A78, and it had the narrowest interquartile range (IQR) compared to the other spike regions (Supplementary Fig. S1). Comparison of coverage between amplicons showed no significant differences, with IQRs lower than 0.242 in all cases except for A82 and A84 (Supplementary Table S3).
As to the relative frequency of the variants found (number of defective reads per amplicon divided by total number of reads per amplicon) in all patients and in the whole spike gene, the largest percentage of variants with defective genomes were seen in the first and second pandemic waves (Tables 1, 2 and 3) and in Omicron ( Table 7). The most prevalent defective deletion was Δ246R-249L in 7.03% of Omicron patients (Table 7), Δ852N-856V in 4.12% of B.1.5 patients (Table 1), and Δ656Y-670Y in 2.02% of B.1.177 and 1.82% of B.1.5. All other defective deletions were present at frequencies below 1% (Tables 1, 2 and 3).
Of particular note, we found a significant presence of defective deletions in the Δ640S-674Y region (nt1920-2021) in amplicon A78 on the S1/S2 cleavage site in patients from the first and second waves and in Omicron patients, but not in Alpha, Beta and Delta patients ( Table 2, Supplementary Fig. S4). The absence of defective genomes at this position in Alpha, Beta and Delta was not the result of inadequate sequencing coverage or number of patients studied (Supplementary Table S4). Defective genomes close to S1/S2 appeared in variants occurring at the beginning of the pandemic and in Omicron regardless of the number of sequences studied, whereas none of the Alpha, Beta or Delta patients showed any defective genomes in this sequence (Fig. 1). Moreover, although Omicron had the lowest sequencing coverage, it showed a pattern of defective deletions similar to those of B. 1.1, B.1.5, and B.1.177.
Some specific defective deletions in the Δ640S-674Y fragment coincided in 2 or more patients infected by variants from the first and second waves. For example, the Δ656Y-670Y deletion was found in 3 patients with B.1.5 and 6 with B.1.177 (Tables 1, 2 and 3). Furthermore, Δ640S-674Y deletions were found in 4 patients with Omicron (Table 7). However, no deleted genomes in patients infected with the Alpha, Beta and Delta variants were detected in amplicon A78 (Tables 4, 5   www.nature.com/scientificreports/ even though coverage was similar among all variants studied ( Supplementary Fig. S4, Supplementary Table S4). However, we cannot exclude the presence of defective haplotypes with abundances below the 0.1% threshold. The number, type, and frequency of deleted positions coincided in 22 of the 25 (88%) samples amplified in parallel using the ARTIC and N07 1 protocols, ruling out bias caused by primer amplification (primers shown in Supplementary Table S5). In the remaining 4 samples, N07 detected deletions of 2 nucleotides that were not found with ARTIC, possibly because of the double PCR (RT-PCR-Nested) required for N07, which has a higher risk of introducing artefactual mutations, or the 2 to 5 times higher coverage in the N07 study. Interestingly, in 2 samples (patients V69S13 and V70S02), ARTIC was able to identify defective deletions that were not visualized with the N07 primers, even though ARTIC had lower coverage (81,248 and 45,575 reads) than N07 (242,697 and 240,491 reads) (Supplementary Table S6). In general, the ARTIC protocol identified a larger number of defective deletions, except for a 7-nt deletion in V70S12 and a 29-nt haplotype in V71S13, which were only seen using the N07 protocol. Our data show that results using ARTIC were concordant with those of N07, and that in the worst case, the N07 protocol might even underestimate the rate of defective deleted haplotypes.

Discussion
The arrival of Omicron (B.1.1.529) to our geographic area has changed the profile of circulating SARS-CoV-2 variants, as it is now detected in 100% of infected patients in Barcelona city (Fig. 2). This predominance suggests that Omicron has biological advantages over the other variants: higher transmissibility and likely greater resistance to the immune response acquired by current COVID-19 vaccines or overcoming a previous SARS-CoV-2 infection 8,9 . The origin of Omicron is an open question. One hypothesis is that it could have evolved and emerged from a population undergoing little surveillance before being detected. Another proposal is that it may have resulted from viral evolution in patients with long-term persistent infection, a situation reported in clinical groups with a weak immune response, such as immunocompromised patients. A third hypothesis is that Omicron may have emerged by a cross-species jump from humans to nonhuman species, subsequently spilling back to humans 13,14 . Independently of its origin, whole-genome sequencing using next-generation techniques has shown that Omicron sequences are clustered far from the millions of SARS-CoV-2 genomes uploaded to GISAID. The present study points to an additional differentiating feature of Omicron: the pattern of mutations occurring at the S1/S2 cleavage site.
A higher presence of any variant in the human population as the pandemic progressed might reflect acquisition of a certain biological advantage over previous variants [15][16][17] . The scientific community has shown special interest in VOCs because of reported evidence indicating an impact on transmissibility and immunity attributed to multiple mutations in the receptor binding domain of the spike protein 6,7 . Another type of mutation involves deletions, which are a loss of nucleotides during the replication process that can cause a change in the correct reading frame. In a previous study 1 , we found that the early lineages showed deletions across the spike gene, some of them close to the S1/S2 spike cleavage site, which, in most cases, caused a frameshift and the appearance of a premature stop codon. Using a ribosome-profiling approach in samples from Germany, Finkel et al. 18 described a deletion located in the furin-like cleavage site (TNSPRRAR, referred to in the present study as the S1/S2 cleavage site), affecting nucleotides 23,595 to 23,615, that was recurrently selected during passage of the original SARS-CoV-2 (EPI_ISL_406862) in Vero E6 cells 18 . In our analysis of 5,730,959 sequences from 78 patients, we did not find the exact deletion of that nucleotide region, which includes the amino acids PRRAR or HRRAR (Omicron). The deletions we identified occurred just before the cleavage site, specifically from nucleotide 23,555 (aa640S) to 23,580 (aa674Y). Nonetheless, the deletions found in both cases (naturally occurring and selected during Vero E6 viral cell culture) occurred in a small region of the spike, near or above the furin-like cleavage site, suggesting that this region is a hot spot for nucleotide deletions that could provide the virus with an evolutionary advantage. Postnikova et al. 19 have suggested that the presence of two tandem CGG codons (the rarest in the SARS-CoV-2 genome), coding the two consecutive Rs in the PRRAR region, could cause ribosomal pausing and even frameshifting. Translational pausing resulting from usage of an extremely rare dicodon, together with naturally occurring defective deletions close to S1/S2 that cause a frameshift with appearance of a new stop codon 1 , might suppress spike protein extension and lead to production of S1 free protein, a situation suggested in our previous study 1 . It is reasonable to postulate that the deletions and usage of rare codons indicate that this region is extremely important for the biology of the virus, enabling SARS-CoV-2 to readily infect humans. Free spike and truncated S1 protein, recently reported in plasma of severely ill COVID-19 patients, has been attributed to tissue damage resulting from severe disease 2 . In any case, this finding indicates that the complete spike protein and the truncated S1 form are soluble and can be secreted by infected cells, perhaps even in mild cases, as we suggested previously 1 .
Here we show that variants dominating the first (B.1.5, B.1.1) and second (B.1.177) pandemic waves had a large frequency of minority mutants with deletions causing defective genomes, whereas the Alpha, Beta and Delta variants, predominant in several regions of the world 5,20,21 , had a smaller presence of deleted genomes, suggesting that this situation may have favored their spread in the human population, overcoming other variants. Defective viral genomes occur in most, if not all, RNA viruses during infection 22 . Deep sequencing has shown that several species of defective genomes are generated, as was seen in samples from influenza virus-infected patients and cultures of metapneumovirus and measles virus [23][24][25] . Coronaviruses are not an exception 1,26-31 . It has been reported that defective genomes can interfere with wild-type viral infection, causing a reduction of virulence in vivo 32,33 , and there are many examples of defective genomes having an impact on human and animal health as a part of viral evolution and adaptation to the host 22 .
The Alpha, Beta and Delta variants appeared at a time when there was a high incidence of new infections, and they had to compete with other variants. The success of these lineages suggests that they had acquired a biological advantage to beat their competitors and dominate the pandemic wave. Our results show that Alpha, Beta, and www.nature.com/scientificreports/ Delta had no defective deletions in the spike region close to the S1/S2 cleavage site and much fewer defective deletions than the B.1.5, B.1.1, or B.1.177 lineages. A smaller presence of defective genomes might be associated with a greater presence of infectious viral particles, without affecting viral load (measured using the surrogate Ct value). With fewer defective genomes, there would be less circulating free SI and the putative competition of free S1 with infecting particles would decease or fail, with one outcome being greater infectivity. In addition, a significant reduction in defective genomes could be associated with longer duration of positive PCR testing and greater disease severity. It has been reported that the time interval from symptoms onset to viral load decrease is longer in infection by Delta than by other variants, and that Delta infection is associated with a higher risk of severe disease in unvaccinated people, and a greater need for remdesivir or corticosteroid treatment 34 . In their publication, the CDC concluded that Delta was more contagious than previous variants 35 and a study from Ontario, Canada reported that there was a pronounced increase in hospitalization, ICU admission, and death during Delta variant dominance, particularly in unvaccinated people 36 . Surprisingly, Omicron has shown a frequency of minority mutants with deletions more similar to variants dominating the first and second waves than to Alpha, Beta and Delta. Delta and its subvariants accounted for the vast majority of infections up to the end of 2021, and it was expected that one of these subvariants would ultimately predominate over others and become endemic and seasonal. In this scenario, the appearance of Omicron was unexpected and surprising, and its rise and dominance over Delta suggests that Omicron has higher fitness than other variants. The Omicron genome includes mutations that help the virus overcome the host's immune response and spike gene mutations and deletions affecting the receptor binding domain and the S1/S2 cleavage site (HRRAR in Omicron) that increase host cell ACE2 receptor recognition. The explosion of Omicron, which at this time is the most prevalent variant all over the world, may also be explained by its tropism for the upper respiratory tract, which facilitates viral transmission 37 . Nonetheless, despite its dominance over Delta, Omicron subvariants BA.1 and BA.1.1 have not led to an increased severity of the infection (hospital admissions to ICU units) or mortality 38 .
In conclusion, the frequency of concomitant defective deletions close to the S1/S2 cleavage site was higher in dominant variants seen at the beginning of the pandemic than in the Alpha, Beta and Delta variants, suggesting that these mutations may contribute to adapting SARS-CoV-2 to human infectivity. The observation obtained here that the presence of mutations in Omicron is similar to that of variants at the beginning of the pandemic provides information that could be of value when studying the evolution of Omicron subvariants spreading among the human population. Our results concur with findings from previous studies indicating that the S1/ S2 cleavage site is an important region for the biology of the virus, affecting the capability of SARS-CoV-2 to readily infect humans.

Materials and methods
Patients and methods. Naso/oropharyngeal swab samples were collected from laboratory-confirmed SARS-CoV-2 patients meeting the World Health Organization (WHO) definition 39  The patients' demographic data (sex and age) were obtained retrospectively. Clinical data, sample extraction data, cycle threshold (Ct) value (when measured), and symptoms are specified in Supplementary Table S1. Only samples with Ct values lower than 30 (with some exceptions) were included in the whole-genome sequencing weekly analysis 5 . Only samples from mild or asymptomatic patients were selected for study because this group showed the highest prevalence of deletions at the beginning of the pandemic 1 . Institutional review board of Clinical Research Ethics Committee (CEIm) Vall d'Hebron University Hospital approval was obtained(PR(AG)259/2020). The need for informed consent was waived by CEIm Vall d'Hebron University Hospital, as study used routinely collected for surveillance activities. All methods were performed in accordance with the relevant guidelines and regulations.
SARS-CoV-2 detection is described in the Supplementary Material for this study and in Andres et al. 1 . To sequence the whole spike gene, we mainly used ARTIC v3 primers including the 21,658 bp to 25,673 bp position, corresponding to the nCoV-2019_72 (A72) to nCoV-2019_84 (A84) overlapping amplicons (artic28-ncov2019/ nCoV-2019.scheme.bed, ARTIC Network), and reformulated the ones that gave amplification problems due to deletions. Spike pair and impair primers were tested for efficacy and mixed in 2 different pools for the 2 posterior multiplexed PCRs. Each pool was optimized by adjusting primer concentration to obtain a balanced number of reads for each amplicon.
The bioinformatics analysis was done as reported by Andres et al. 1 , and the Wilcoxon test was used to compare Ct values between samples with and without emergent deletions. Pearson's chi-square test was used to compare the emergence of deletions with sex or presence of COVID-19 disease symptoms (eg, high fever, anosmia, ageusia, persistent headache).
Rationale for the bioinformatics analysis used. Conventional bioinformatics analyses can be carried out using several tools and platforms. In terms of sequence quality control, the analysis is performed with tools that can analyze various quality metrics and correct these if necessary. For example, in normal conditions, sequence trimming can be performed with the trimmomatic 40  www.nature.com/scientificreports/ sequences, nucleotides, primers, and adapters, among others. However, this approach could not be applied in our analysis, as reads would be shortened according to their quality and ultimately have different lengths, making proper analysis and comparison impossible. With the use of conventional pipelines, once the reads are trimmed, you would map them against a reference to perform variant calling, where you would be able to see all changes. Several tools can be used for this type of analysis, such as bwa 42 , bowtie2 43 or bbmap 44 for alignment, and samtools or BCF tools 45 for variant calling, among others. However, our aim was to detect deletions with specific characteristics according to the haplotypes present, which cannot be achieved using conventional pipelines, as this would result in the identification of all changes according to the reference.
In addition, analysis of the variants would lead to the consensus sequence, which was not the aim of our pipeline, as we were interested in studying the diversity of haplotypes.

Data availability
The datasets generated and/or analysed during the current study are available in the GenBank repository, Gen-Bank Sequence Read Archive (SRA) database with BioProject accession number PRJNA788442, and sequences from Omicron with accession number SUB11151740.